Clustering Geolocation Data in Python using DBSCAN and K-Means

Clustering is a technique for dividing a population of data points into groups (clusters) based on how similar or dissimilar the points are to one another. It helps reveal the intrinsic grouping among unlabeled data points.

In this project we use a taxi dataset (available for download from Kaggle) and cluster its geolocation data with K-Means. We then demonstrate two density-based alternatives. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) discovers clusters of different shapes and sizes in data that contains noise and outliers. HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) performs DBSCAN over varying epsilon values and integrates the results to find the clustering that is most stable over epsilon.

Folium makes it easy to visualize data that has been manipulated in Python on an interactive Leaflet map. It supports both binding data to a map for choropleth visualizations and passing rich vector/raster/HTML visualizations as markers on the map. The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys.
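To make the contrast between K-Means and DBSCAN concrete, here is a minimal pure-Python sketch (standard library only), not the implementation used in this project. It assumes points are (latitude, longitude) pairs in degrees. DBSCAN measures great-circle (haversine) distance so that eps can be given in kilometres, and K-Means runs plain Lloyd's iterations on raw degrees, which is only a rough approximation over a small area. The helper names (haversine_km, dbscan, kmeans) and the coordinates are hypothetical and exist only for illustration. In practice you would use a library such as scikit-learn rather than these brute-force loops.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
NOISE = -1  # label DBSCAN gives to points that belong to no cluster


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dbscan(points, eps_km, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # Brute-force range query: every point within eps_km of point i (including i)
        return [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later if it turns out to be a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_samples:  # j is also a core point: keep growing
                queue.extend(neighbours)
        cluster += 1
    return labels


def kmeans(points, init, iters=20):
    """Lloyd's algorithm on raw (lat, lon) degrees, starting from the given centroids.
    Treating degrees as planar coordinates is only a rough approximation."""
    centroids = list(init)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [min(range(len(centroids)),
                      key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels, centroids


# Hypothetical pickup locations: two dense hotspots plus two isolated outliers
rng = random.Random(42)
hotspots = [(40.75, -73.99), (40.64, -73.78)]
points = [(lat + rng.gauss(0, 0.001), lon + rng.gauss(0, 0.001))
          for lat, lon in hotspots for _ in range(30)]
points += [(40.90, -73.70), (40.55, -74.20)]

db_labels = dbscan(points, eps_km=0.5, min_samples=5)
km_labels, km_centroids = kmeans(points, init=[points[0], points[30]])

print("DBSCAN labels of the outliers:", db_labels[-2:])   # -> [-1, -1]
print("K-Means labels of the outliers:", km_labels[-2:])  # each forced into a cluster
```

The design choice this illustrates: K-Means must assign every point, including the two isolated outliers, to one of its k centroids. DBSCAN only grows clusters from points that have at least min_samples neighbours within eps, so sparse points stay labelled as noise. With scikit-learn, the DBSCAN step would roughly correspond to DBSCAN(eps=0.5 / 6371.0, min_samples=5, metric="haversine") fitted on the coordinates converted to radians, because the haversine metric works in radians on the unit sphere.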